server : add Hermes-3 tool call support (WIP) #9254
base: master
Conversation
I have a suggestion: you could detect tool call actions by token ids instead of token strings. In my own Python implementation, if the start tool call token id is generated (<|python_tag|> in the case of Llama 3.1), streaming is paused until the stop / end-of-tool-call token is generated. Then a recursive call is made with the output of the tool, and streaming is resumed.
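The pause-and-buffer behaviour described above could be sketched as follows. This is a minimal sketch, not the PR's implementation; the token ids and the `on_token` helper are hypothetical (real ids depend on the model's vocabulary):

```cpp
#include <cassert>
#include <vector>

// Hypothetical token ids -- real values depend on the model's vocab.
constexpr int TOK_TOOL_CALL_START = 128010; // e.g. <|python_tag|> in Llama 3.1
constexpr int TOK_TOOL_CALL_END   = 128011; // e.g. end-of-tool-call / stop token

struct stream_state {
    bool in_tool_call = false;      // streaming is paused while true
    std::vector<int> tool_call_buf; // tokens buffered for the tool call
    std::vector<int> streamed;      // tokens actually sent to the client
};

// Process one generated token id: pause streaming between the start and
// end tool-call tokens, buffering the tool call instead of emitting it.
// Returns true when a complete tool call has just been captured, at which
// point the caller would run the tool and re-prompt with its output.
bool on_token(stream_state & st, int tok) {
    if (!st.in_tool_call && tok == TOK_TOOL_CALL_START) {
        st.in_tool_call = true;      // pause streaming
        return false;
    }
    if (st.in_tool_call) {
        if (tok == TOK_TOOL_CALL_END) {
            st.in_tool_call = false; // resume streaming afterwards
            return true;
        }
        st.tool_call_buf.push_back(tok);
        return false;
    }
    st.streamed.push_back(tok);      // normal token: stream it out
    return false;
}
```
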
@qnixsynapse Yes, it's possible to do so with the Hermes-3 format, but that will not be possible with either Llama 3.1 JSON tool calls or Llama 3.1 custom functions. The goal here is to make it compatible with OAI specs, so relying on special token ids alone won't cover every template. Anyway, I'll consider doing this later on, when tool call templates are more mainstream and patterns start to emerge.
@ngxson Here, for example, we can expand this:

```cpp
else if (has_token("[/INST]") && has_token("[TOOL_CALLS]")) {
    return LLAMA_TOOL_FORMAT_MISTRAL;
}
```

Regarding streaming, this is sufficient I think: upload.mp4
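The template-based detection suggested above could look like the sketch below. It is an illustration, not the PR's code: the real `has_token` presumably inspects the model's chat template or vocabulary, while here it is a plain substring check on the template text, and the `LLAMA_TOOL_FORMAT_HERMES_3` branch and Hermes tags are assumptions:

```cpp
#include <cassert>
#include <string>

enum llama_tool_format {
    LLAMA_TOOL_FORMAT_NOT_SUPPORTED,
    LLAMA_TOOL_FORMAT_HERMES_3,
    LLAMA_TOOL_FORMAT_MISTRAL,
};

// Stand-in for the server's has_token: substring search on the template.
static bool has_token(const std::string & tmpl, const std::string & tok) {
    return tmpl.find(tok) != std::string::npos;
}

// Guess the tool call format from markers present in the chat template.
llama_tool_format detect_tool_format(const std::string & tmpl) {
    if (has_token(tmpl, "<tool_call>") && has_token(tmpl, "</tool_call>")) {
        return LLAMA_TOOL_FORMAT_HERMES_3;
    } else if (has_token(tmpl, "[/INST]") && has_token(tmpl, "[TOOL_CALLS]")) {
        return LLAMA_TOOL_FORMAT_MISTRAL;
    }
    return LLAMA_TOOL_FORMAT_NOT_SUPPORTED;
}
```
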
@ngxson, I appreciate your work; the new feature is great. EDIT: Also, the cases in which the "tools" parameter in the /completion request is null, or an empty array, should be accepted and treated the same as the case in which the parameter does not exist at all.
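The normalization requested above amounts to treating "absent", "null", and "empty array" identically. A minimal sketch, assuming the parsed request is modeled with `std::optional` standing in for a missing or null JSON field (the real server parses JSON; `tool_list` and `request_has_tools` are hypothetical names):

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical parsed "tools" field: nullopt models an absent or null
// JSON value; an empty vector models "tools": [].
using tool_list = std::vector<std::string>;

// Absent, null, and [] all mean "no tools": fall back to plain completion.
bool request_has_tools(const std::optional<tool_list> & tools) {
    return tools.has_value() && !tools->empty();
}
```
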
```cpp
ss << "<|im_start|>system\n\n";
ss << "You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools>\n\n";
for (auto tool : tools) {
    ss << tool.dump(1, '\t') << "\n\n";
```
Why the tabs? They increase the number of tokens, and I don't think they provide useful information.
```diff
-            ss << tool.dump(1, '\t') << "\n\n";
+            ss << tool.dump() << "\n\n";
```
Related to #5695
Close #9031

This is still WIP.

What is working:
- tools via /chat/completion
- stream ==> currently works for non-tool response

Special thanks to @Rocketknight1 for his very detailed blog post: Tool Use, Unified